ABSTRACT


Forty-eight subjects performed a short-term memory task with several difficulty levels and
provided both immediate and delayed ratings of workload via the Subjective Workload Assessment
Technique (SWAT). Mean SWAT ratings did not vary significantly as a function of delayed report, but
a substantial number of subjects gave delayed ratings that were discrepant from their immediate
ratings. A counterbalancing effect in delayed ratings appears to have been a factor in the failure
of the delay effect to reach significance. A secondary objective of this study was to examine the
sensitivity of SWAT in a between-subjects design. SWAT ratings varied significantly as a function
of task difficulty manipulations, supporting the sensitivity of SWAT to the workload of the
conditions used.


INTRODUCTION 

Subjective  techniques  have  been  used 
extensively  as  measures  of  operator  workload 
(e.g.,  Moray,  1982;  Williges  and  Wierwille, 
1979).  A  variety  of  different  techniques 
(e.g., magnitude estimation, paired comparisons)
have been applied in gathering workload
judgments, but the rating scale is the most
frequently used procedure, especially in simulation
or operational environments. The widespread
use of rating scales can be attributed
to  their  ease  of  implementation,  lack  of 
intrusiveness  on  operator  performance,  and 
high  degree  of  operator  acceptance. 

In  application-oriented  environments,  a 
question  exists  concerning  how  the  accuracy  of 
subjective  ratings  might  be  affected  when 
practical  constraints  require  a  delay  between 
task  performance  and  workload  estimation.  For 
example,  it  is  frequently  maintained  that  a 
pilot  or  operator  is  too  busy  during  peak 
workload  periods  to  complete  a  rating  scale, 
and  that  workload  reports  must  be  delayed 
until  the  opportunity  arises  to  complete 
them. Since subjective ratings depend upon
the operator's ability to remember the workload
experienced during task performance,
delays in rating scale completion constitute
retention intervals for the information which
is necessary to estimate subjective load.
Although the current short-term memory literature
(e.g., Klatzky, 1980) clearly indicates
that some loss of unrehearsed information will
occur at relatively short retention intervals
(e.g., 15 to 30 seconds), little data
currently exist that address the specific
relationship between retention interval and
the accuracy of subjective ratings of
workload. Therefore, the major purpose of
this  experiment  was  to  investigate  the  effect 
of  a  short  retention  interval  on  subjective 
ratings  of  workload. 

The procedure used to gather the subjective
ratings in this experiment was the
Subjective Workload Assessment Technique
(SWAT). In SWAT (e.g., Reid, Shingledecker,
and Eggemeier, 1981; Reid, Shingledecker,
Nygren, and Eggemeier, 1981; Reid, Eggemeier,
and Nygren, 1982), subjective workload is
defined as being composed of three dimensions:
(1) time load, (2) mental effort load,
and (3) stress load. Each dimension is represented
by an individual three-point rating
scale with descriptions for each level of
load. SWAT is based on conjoint measurement
and scaling (e.g., Krantz and Tversky, 1971;
Nygren, 1982) and permits ratings on the three
dimensions to be combined into one overall
interval scale of workload. In order to identify
the appropriate rule for combining the
three dimensions into one overall scale, a
scale development phase is completed. During
this phase, subjects (Ss) rank order the subjective
workload associated with the 27 possible
combinations that result from the three
levels of time, mental effort, and stress
load. After completion of scale development,
an event scoring phase is initiated. During
event scoring, Ss perform the task(s) of
interest and rate the time, mental effort, and
stress load imposed by task performance.
Individual ratings on the three dimensions are
then converted to the overall interval scale
that was derived during the scale development
phase. More detailed discussions of the
procedure can be found in Reid et al. (1981a;
1982).
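
As a rough illustration of the scale development and event scoring phases described above, the following Python sketch enumerates the 27 level combinations and converts one set of three-point ratings to a single 0 to 100 value. It assumes an additive combination rule, and the per-level values in HYPOTHETICAL_VALUES are invented for illustration only; in SWAT the combination rule and the scale values are derived from the rank orderings through conjoint analysis, which is not reproduced here.

```python
from itertools import product

LEVELS = (1, 2, 3)
DIMENSIONS = ("time", "effort", "stress")

# Hypothetical per-level scale values (assumption, not from the study).
# In SWAT these would come out of the conjoint scaling of rank orderings.
HYPOTHETICAL_VALUES = {
    "time": {1: 0.0, 2: 20.0, 3: 40.0},
    "effort": {1: 0.0, 2: 15.0, 3: 35.0},
    "stress": {1: 0.0, 2: 10.0, 3: 25.0},
}


def all_combinations():
    """Return every (time, effort, stress) combination of the three levels."""
    return list(product(LEVELS, repeat=len(DIMENSIONS)))


def overall_rating(time, effort, stress, values=HYPOTHETICAL_VALUES):
    """Combine three ratings additively and rescale the sum to 0-100."""
    raw = values["time"][time] + values["effort"][effort] + values["stress"][stress]
    lowest = sum(min(levels.values()) for levels in values.values())
    highest = sum(max(levels.values()) for levels in values.values())
    return 100.0 * (raw - lowest) / (highest - lowest)


if __name__ == "__main__":
    print(len(all_combinations()))   # number of combinations to rank order
    print(overall_rating(2, 2, 2))   # one event-scoring conversion
```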

Previous investigations with SWAT have
demonstrated that the workload ratings were
sensitive to variations in the difficulty of
several different tasks, including simulated
aircrew radio communications and critical
tracking (Reid et al., 1981a), short-term
memory (Eggemeier, Crabtree, Zingg, Reid, and
Shingledecker, 1982), and probability monitoring
(Notestine, 1983). All of these investigations
used within-subjects designs to
examine SWAT sensitivity. This type of design
is appropriate for evaluating SWAT sensitivity,
since many applications of workload
metrics involve within-subjects designs. For
example, the same group of test pilots will
frequently participate in all conditions of a
display option evaluation conducted in a
flight simulator. In some applications, however,
practical constraints may make it impossible
for the same group to participate in all
phases of an evaluation. This raises a
methodological question concerning the sensitivity
of SWAT to task difficulty differences
when a between-subjects design is used.
Therefore,  a  secondary  objective  of  this  study 
was  to  initially  investigate  the  between- 
subjects  design  question  as  it  pertains  to 
SWAT. 

Both  the  delayed  rating  and  between- 
subjects  design  issues  were  addressed  by 
examining  SWAT  ratings  that  were  completed  by 
Ss after performing a short-term memory update
task (e.g., Monty, Taub, and Laughery,
1965). This task required that Ss update and
recall the status of several categories of
information which changed on a regular
basis. The memory update task was chosen for
this experiment since previous research had
indicated that (1) task difficulty could be
effectively varied by manipulating stimulus
presentation rate (e.g., Monty et al., 1965);
and (2) SWAT was sensitive to such manipulation
in a within-subjects design (Eggemeier
et al., 1982).

METHOD 

Subjects. Ss were 48 introductory
psychology students at Wright State University.
Ss received extra course credit for
their participation in the experiment.

Apparatus.  Memory  stimulus  materials 
were  presented  on  a  12-inch  video  monitor 
which was driven by a Commodore VIC 20 computer.
Ss were seated approximately 3 meters
from  the  monitor. 

Procedure. Categories of information
used in the memory update task were four
letters of the alphabet (Q, R, S, T) which
appeared individually for 500 msec on the
display. The memory task required that Ss
keep track of the number of times that each
letter category occurred in a sequence. These
sequences averaged 20 individual letters which
were distributed across the four categories.
At the completion of a letter sequence, recall
instructions were presented on the display and
Ss completed an answer sheet. Task difficulty
was manipulated by varying the rate of letter
presentation (interstimulus intervals of 1.0,
2.0, and 3.0 seconds). Presentation rate was
a between-subjects variable, with 16 Ss performing
the memory task at each rate.

During data collection, a block of three
trials was presented to Ss. After each trial,
Ss indicated the number of times that each
letter category had occurred in the
sequence. Absolute error in recalling the
number of instances of each category served as
the memory performance measure. This was computed
by determining the deviation of an S's
response from the correct number for each
category, and summing the deviations. The
measure was absolute in that no distinction
was made between overestimates and underestimates
by an S. At the completion of a block
of trials, Ss rated the subjective workload by
completing separate three-point ratings on
time, mental effort, and stress load.
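
The absolute error measure described above can be sketched in a few lines of Python. The letter sequence and the reported counts below are hypothetical examples, not data from the experiment, and the function names are illustrative.

```python
CATEGORIES = ("Q", "R", "S", "T")


def true_counts(sequence):
    """Count how often each letter category occurs in a presented sequence."""
    return {letter: sequence.count(letter) for letter in CATEGORIES}


def absolute_error(sequence, reported):
    """Sum the unsigned deviations of reported counts from the true counts.

    Overestimates and underestimates contribute equally, as in the text.
    """
    actual = true_counts(sequence)
    return sum(abs(reported[letter] - actual[letter]) for letter in CATEGORIES)


# Hypothetical 20-letter trial and one subject's hypothetical answer sheet.
sequence = "QRSTQQRSTTQRSSTQRQTS"
reported = {"Q": 5, "R": 4, "S": 6, "T": 5}

if __name__ == "__main__":
    print(true_counts(sequence))
    print(absolute_error(sequence, reported))
```

Here the subject undercounts Q by one and overcounts S by one, and both deviations add to the score rather than cancelling.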

Delay of ratings was a within-subjects
variable, so that each S provided a SWAT
rating immediately after completion of a block
of trials and also after a 15 minute delay
period. Order of the delay interval (0 versus
15 minutes) was counterbalanced such that one-half
of the Ss completed immediate ratings
first, while the other half completed the
delayed ratings first. The former Ss performed
the memory update task, provided their
ratings, and were given a 15 minute rest
period. After the rest period, Ss performed a
memory update task at the same presentation
rate as the first task, played a video game
for 15 minutes, and then completed their SWAT
ratings for the second memory task. Ss who
completed the delayed rating first followed
the same procedure in the reverse sequence.
Although the presentation rate in both memory
tasks was the same for each group of Ss, the
actual sequences of letters were different and
were counterbalanced across the immediate and
delayed rating conditions. One purpose of the
15 minute rest period was to minimize the
likelihood that Ss would recognize that the
presentation rates of the two tasks were identical.
Ss were not informed that a workload
rating would be required in the delay condition
until the rating was actually
requested. The video game that was played
during the 15 minute delay required the use of
a joystick to maneuver a simulated boat, and
was predominantly psychomotor in nature. The
game did not specifically require retention of
verbal information, and was chosen because it
was dissimilar to the memory update task. A
dissimilar task was used in order to minimize
interference effects and provide a relatively
pure estimate of the effects of the 15 minute
delay on workload ratings.
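
One way to picture the resulting design is as a 3 x 2 grid of presentation rate by rating order. The sketch below assigns 48 Ss to that grid. It assumes that rating order was balanced within each presentation rate (8 Ss per cell) and that assignment was random; the text states only that 16 Ss served at each rate and that one-half of the Ss gave immediate ratings first, so both assumptions and all names here are illustrative.

```python
import random
from itertools import product

RATES = (1.0, 2.0, 3.0)                          # interstimulus intervals, seconds
ORDERS = ("immediate_first", "delayed_first")    # counterbalanced rating order


def assign_conditions(n_subjects=48, seed=0):
    """Randomly assign subjects to rate x order cells of equal size.

    Equal cell sizes within each rate are an assumption of this sketch.
    """
    cells = list(product(RATES, ORDERS))
    per_cell, remainder = divmod(n_subjects, len(cells))
    if remainder:
        raise ValueError("n_subjects must divide evenly across the cells")
    slots = [cell for cell in cells for _ in range(per_cell)]
    random.Random(seed).shuffle(slots)
    return {subject: slot for subject, slot in enumerate(slots, start=1)}


if __name__ == "__main__":
    for subject, (rate, order) in list(assign_conditions().items())[:5]:
        print(subject, rate, order)
```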


Prior to actual data collection, Ss
received practice on the memory update task
and on performing SWAT ratings. During
training, all Ss performed three blocks of
training trials with presentation rate/memory
category combinations that differed from those
used during actual data collection. The combinations
used during training included:
(1) three categories at a 4.0 second rate,
(2) four categories at a 2.5 second rate, and
(3) five categories at a 1.0 second rate. It
is, therefore, important to note that although
actual data collection was conducted under a
between-subjects design, all Ss had performed
and rated the same group of practice tasks.

During the practice session, Ss also
completed the scale development phase of
SWAT. Following procedures outlined by Reid
et al. (1981a; 1982), interval level SWAT
scales  with  ranges  of  0  to  100  were  derived 
for  use  as  the  subjective  workload  measures  in 
subsequent  analyses. 

RESULTS 

Memory  performance  data  were  analyzed 
using  a  two-factor  analysis  of  variance 
(ANOVA). Three levels of the presentation
rate variable (1.0, 2.0, 3.0 seconds) and two
levels of the rating delay variable (0,
15 minutes) were included in the ANOVA. A
square root transformation, designed to remove
proportional relationships between means and
variances that are common in this type of
error data, was applied prior to conducting
the ANOVA. Figure 1 shows the mean transformed
memory error scores as a function of
presentation rate and rating delay condition.
As is clear from Figure 1, neither
presentation rate nor rating delay had a
marked effect on performance. The ANOVA confirmed
this, and indicated that the main
effects of presentation rate [F(2,45) = 1.20,
p > .25], rating delay [F(1,45) = 0.92, p >
.25], and their interaction [F(2,45) = 0.48,
p > .25] were not significant. The nonsignificant
effect of rating delay condition was
expected, since that factor simply represented
whether the workload rating completed subsequent
to task performance was immediate or
delayed.  Likewise,  there  was  no  reason  to 
anticipate  a  significant  interaction. 
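
The purpose of the square root transformation can be illustrated with a small Python sketch. The two groups of error scores below are hypothetical values chosen so that the group with the larger mean also has the larger variance; they are not data from this experiment.

```python
import math
from statistics import pvariance


def sqrt_transform(errors):
    """Apply the square root transformation to a list of error scores."""
    return [math.sqrt(error) for error in errors]


def variance_ratio(group_a, group_b):
    """Ratio of the variance of group_b to the variance of group_a."""
    return pvariance(group_b) / pvariance(group_a)


# Hypothetical absolute error scores for a low-error and a high-error group.
low_error = [0, 1, 1, 2, 1]
high_error = [4, 9, 1, 9, 4]

raw_ratio = variance_ratio(low_error, high_error)
transformed_ratio = variance_ratio(sqrt_transform(low_error),
                                   sqrt_transform(high_error))

if __name__ == "__main__":
    print(raw_ratio, transformed_ratio)
```

With these example values the variances of the two groups are much closer after the transformation than before, which is the property that motivates applying it before an ANOVA.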

Figure 2 shows mean overall interval SWAT
ratings as a function of presentation rate and
rating delay condition. A 3 x 2 ANOVA
performed  on  the  SWAT  data  indicated  that  the 
main  effect  of  presentation  rate  [F(2,45)  = 
8.14,  p  <  .01]  was  significant,  but  that  the 
main effect of rating delay [F(1,45) = 1.48,
p  <  .25]  and  the  interaction  [F(2,45)  =  0.62, 
p  >  .25]  were  not. 

Figure 1. Mean Square Root of Absolute
Memory Error as a Function of
Stimulus Presentation Rate and
SWAT Rating Delay Condition

Although the rating delay effect was not
significant, Figure 2 indicates that there was
some tendency for mean immediate and delayed
ratings to differ, particularly in the
2.0 second presentation rate condition. This
tendency is supported by the fact that 31 of
the 48 Ss assigned ratings under the delay
condition that differed from their immediate
ratings. Among the Ss who showed a discrepancy
between immediate and delayed ratings, 20
Ss increased their ratings in the delayed condition,
while 11 Ss decreased their delayed
ratings. This trend is reflected in Figure 2,
since delayed ratings tend to be higher than
immediate ratings. The noted pattern of
changes also suggests the existence of a mild
counterbalancing effect, where approximately
65 percent of the Ss whose ratings changed
increased their ratings, while the remainder
decreased their ratings.
Such a counterbalancing effect represents a
potential factor in the lack of a significant
delay effect on mean ratings.
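
The arithmetic behind the counterbalancing argument can be laid out as a short Python sketch. The counts are those reported above. The function net_mean_shift additionally assumes, purely for illustration, that every S who changed a rating moved it by the same number of scale points; the sizes of the individual changes are not reported here.

```python
N_SUBJECTS = 48
INCREASED = 20   # Ss whose delayed rating was higher than the immediate one
DECREASED = 11   # Ss whose delayed rating was lower than the immediate one

changed = INCREASED + DECREASED
unchanged = N_SUBJECTS - changed
share_changed = changed / N_SUBJECTS   # proportion of all Ss with a discrepancy
share_increased = INCREASED / changed  # proportion of discrepant Ss who went up


def net_mean_shift(magnitude):
    """Change in the group mean if every changed rating moved by `magnitude`.

    Equal-sized changes are an assumption of this sketch, not a finding.
    """
    return (INCREASED - DECREASED) * magnitude / N_SUBJECTS


if __name__ == "__main__":
    print(changed, unchanged)
    print(round(100 * share_changed), round(100 * share_increased))
    print(net_mean_shift(10.0))
```

Under that assumption, increases and decreases largely offset one another, so the shift in the group mean is much smaller than the typical individual change.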

Because the presentation rate effect was
significant, a Newman-Keuls multiple comparisons
test was performed in order to specify
the locus of the significant effect(s). This
test indicated that SWAT ratings in the
3.0 second condition differed from those in
the 2.0 second (p < .05) and the 1.0 second
(p < .01) conditions. The difference between
the 1.0 and 2.0 second ratings approached, but
did not reach, significance. Therefore, in the
present between-subjects design, SWAT ratings
demonstrated differences in the workload
associated with different presentation rates,
even though the primary task measure of memory
errors did not.

Figure 2. Mean SWAT Ratings as a Function
of Stimulus Presentation Rate
and Rating Delay Condition


DISCUSSION 

The  results  indicate  that  mean  SWAT 
ratings  were  not  significantly  influenced  by 
the  15  minute  delay  interval  used  in  this 
experiment. However, a substantial number of
Ss did give delayed ratings that differed from
their immediate ones, suggesting that the
delay did contribute to some changes in
ratings. It is probable that the noted counterbalancing
effect of increases versus
decreases in the delayed ratings was a factor
in the failure to find a significant difference
between the mean ratings in the two
conditions. If it is assumed that loss of
information from short-term memory was a major
contributor to the individual discrepancies
between immediate and delayed ratings that did
occur, there is no reason to expect that a
bias in favor of either increases or decreases
should have been present in the data. If
information lost from memory during the
15 minute delay made it necessary for Ss to
guess or estimate the particular levels of
load that had been experienced, it appears
probable that some Ss would increase their
ratings relative to the immediate rating baseline,
while others would decrease their
ratings. The trend toward a counterbalancing
effect  that  was  apparent  in  the  data  can, 
therefore,  be  interpreted  as  consistent  with  a 
loss  from  short-term  memory  of  information 
that  is  necessary  to  complete  subjective 
ratings.  When  such  a  counterbalancing  effect 
is  present,  it  appears  that  delays  should  not 
significantly  alter  mean  ratings. 

The  present  results  are  also  consistent 
with  those  of  Notestine  (1983),  who  recently 
compared  immediate  and  delayed  SWAT  ratings 
resulting  from  performance  of  a  probability 
monitoring  task.  Delays  of  15  and  30  minutes 
had  no  significant  effect  on  mean  SWAT 
ratings, but a number of Ss showed substantial
discrepancies  between  immediate  and  delayed 
ratings.  The  results  of  the  Notestine  and  the 
current  experiment  are,  therefore,  quite 
similar. 

In  applying  the  results,  however,  it  is 
very  important  to  note  that  both  studies  were 
specifically  designed  to  test  the  effects  of 
delays  on  workload  ratings.  In  each  case,  a 
video game that was chosen to minimize interference
effects was performed by Ss during the
delay interval. As noted earlier, one reason
for delaying workload ratings in some applications
is the fact that the operator is too
busy with continuing or subsequent task
performance to complete the necessary
ratings. An important area for additional
research, therefore, is to investigate the
effects of similar intervening tasks on
delayed ratings. It is well established in
the human memory literature (e.g., Klatzky,
1980) that such retroactive interference
effects are an important determinant of forgetting,
and that the degree of interference
experienced in verbal memory can be related to
the similarity of the remembered and intervening
material. Since neither the present study
nor the Notestine experiment was designed to
address the retroactive interference issue,
the results should not be generalized to those
instances where it is possible that such
effects may be present. It is quite possible
that such interference effects could introduce
a systematic bias into the ratings, destroy
any counterbalancing effect, and significantly
influence mean ratings. It is also important
to note that the current results pertain only
to SWAT; it is possible that other rating
scale formats (e.g., 10-point scale) may be
more or less resistant to the effects of delay
than SWAT, which requires that Ss assign relatively
simple three-point ratings on the
dimensions of time, mental effort, and stress
load.


A  secondary  objective  of  this  study  was 
to  initially  examine  the  sensitivity  of  SWAT 
in a between-subjects design. The results
indicated that SWAT was sensitive to variations
in presentation rate in the memory
update tasks, and that the ratings were more
sensitive to such variations than the primary
task measure of memory error. This type of
result would be expected from a sensitive
measure of workload, since primary task measures
such as memory error are generally
thought to discriminate overload from nonoverload
conditions (e.g., Williges and Wierwille,
1979). Apparently, the difficulty manipulations
used in the present study were in a nonoverload
region, leading to a lack of sensitivity
of the primary task measure. The more sensitive
subjective technique, on the other hand,
successfully discriminated several levels of
the variations in load that were employed.
These results, which compare favorably with
data from previous within-subjects work with
the same task (Eggemeier et al., 1982), support
the conclusion that SWAT can be a sensitive
workload index in a between-subjects
design. Although encouraging in this respect,
the present results were obtained with pretraining
by all Ss on a common set of task
difficulty levels, and with a relatively large
number of Ss. In spite of the fact that common
pretraining of Ss may be possible in
operational  applications  of  between-subjects 
designs,  an  important  topic  for  future 
research  deals  with  the  effectiveness  of 
between-subjects  designs  when  common  training 
is  not  provided.  A  direct  comparison  of 
within-  and  between-subject  SWAT  sensitivity 
in  the  same  task  difficulty  conditions  also 
represents  an  area  for  future  research. 